Skip to content

[Bugfix][Elasticsearch] jobmanager memory leak (#9160) #9161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: dev
Choose a base branch
from

Conversation

WenDing-Y
Copy link

Purpose of this pull request

fix #9160

Does this PR introduce any user-facing change?

How was this patch tested?

The issue can only be reproduced in a Flink on K8s deployment environment, specifically in session mode with HA (High Availability) configuration. It is difficult to test locally.

Check list

@WenDing-Y WenDing-Y changed the title [Bugfix][Elasticsearch] leak jobmanager memory leak (#9160) [Bugfix][Elasticsearch] jobmanager memory leak (#9160) Apr 12, 2025
@davidzollo davidzollo added the First-time contributor First-time contributor label Apr 13, 2025
@davidzollo davidzollo requested a review from Copilot April 13, 2025 08:04
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

seatunnel-connectors-v2/connector-elasticsearch/src/main/java/org/apache/seatunnel/connectors/seatunnel/elasticsearch/source/ElasticsearchSourceSplitEnumerator.java:211

  • Consider verifying that the hosts list is not empty before calling hosts.get(0) to avoid a potential IndexOutOfBoundsException.
String endpoint = String.format("%s/_cat/indices/%s?h=index,docsCount&format=json", hosts.get(0), index);

Comment on lines -83 to -85
public void open() {
esRestClient = EsRestClient.createInstance(connConfig);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we can not create client in open method? Could you share more details?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After testing, I found that it is related to the CloseableHttpAsyncClient client in RestClient because this can cause the issue where although the file is deleted, the thread still holds an open handle to the temporary file.
in jobmanager

ls -l /proc/1/fd/ | grep deleted
lr-x------ 1 flink flink 64 Apr 14 05:44 381 -> /tmp/jm_00b72ebe540b0440ed4510730bb35a8b/blobStorage/job_fae4e51b34120030fbda6a238628bd3a/blob_p-3cfee1e57d60be6705eb5ca6458a32f526d30a3a-9b7914da78fe1a3c0389cdbbf9d1e481 (deleted)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean CloseableHttpAsyncClient has bug so that not close properly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a factual perspective, yes. After replacing it with CloseableHttpClient, the #9160 was resolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should raise a issue on ElasticSearch community or upgrade ElasticSearch dependency version to verify it fixed or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hisoka-X Can it be merged now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need more discuss. @hailin0 @corgy-w

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i found the issue about this bug HttpAsyncClient 5.0 file descriptor leak

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the last elasticsearch-rest-client version,the httpasyncclient version is 4.1.5

org.elasticsearch.client elasticsearch-rest-client 8.17.4

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nielifeng nielifeng requested a review from Copilot April 16, 2025 06:45
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

seatunnel-connectors-v2/connector-elasticsearch/src/main/java/org/apache/seatunnel/connectors/seatunnel/elasticsearch/source/ElasticsearchSourceSplitEnumerator.java:90

  • [nitpick] The open() (and close()) methods are now empty following the removal of esRestClient. Consider adding a clarifying comment in these methods to indicate that they are intentionally no-ops after the refactor.
public void open() {}

WenDing-Y and others added 4 commits April 16, 2025 15:09
…org/apache/seatunnel/connectors/seatunnel/elasticsearch/source/ElasticsearchSourceSplitEnumerator.java


fix statusCode!=SC_OK

Co-authored-by: Copilot <[email protected]>
@nielifeng nielifeng requested a review from Copilot April 17, 2025 01:19
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request fixes a memory leak issue in the Elasticsearch connector by refactoring the usage of HTTP clients and removing the persistent EsRestClient object in the source split enumerator.

  • Removed the EsRestClient usage and now creates a new HTTP client per request via HttpClientUtil.
  • Updated open() and close() methods to be empty and replaced them with a dedicated getIndexDocsCount() method for fetching data.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
seatunnel-connectors-v2/connector-elasticsearch/src/main/java/org/apache/seatunnel/connectors/seatunnel/elasticsearch/util/HttpClientUtil.java Introduces a helper for creating SSL-configured HTTP clients.
seatunnel-connectors-v2/connector-elasticsearch/src/main/java/org/apache/seatunnel/connectors/seatunnel/elasticsearch/source/ElasticsearchSourceSplitEnumerator.java Removes the EsRestClient instance and adds a getIndexDocsCount() method for index data retrieval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] [connector-elasticsearch] lead jobmanager memory leak
3 participants